Abstract
Gestures are associated with powerful forms of understanding; however, their causative role in 
mathematics reasoning is less clear. We inhibit college students’ gestures by restraining their 
hands, and examine the impact on language, recall, intuition, and mathematical justifications of 
geometric conjectures. We test four mutually exclusive hypotheses: (1) gestures are facilitative,
through cognitive off-loading, verbal support, or transduction; (2) gestures are not facilitative,
but being inhibited from gesturing increases cognitive demands; (3) gestures are a byproduct of
reasoning processes that would take place with or without the gestures’ overt presence; and (4)
gestures can cause learners to focus on concrete, salient representations, inhibiting abstraction.
We find support for the third hypothesis, concluding that whether learners make or are inhibited
from making gestures does not seem to impact their problem-solving, cognitive, or language
processes. This suggests that being unable to overtly perform personally-generated gestures is
not a hindrance to learners; however, this would not necessarily hold for directed or structured
gestures.


Keywords: gesture; proof; embodied cognition; gesture inhibition; geometry 


RESTRICTING HAND GESTURES 3 


Does Restricting Hand Gestures Impair Mathematical Reasoning? 
1. Introduction 

Embodied views on cognition posit that all mental processes are rooted in perceptual and 
motor systems (Wilson, 2002) and that mental representations of objects are experiential and 
multimodal in nature (Barsalou, 2008). Embodied approaches to teaching mathematics have 
become a particularly important area of study, challenging a tradition where mathematics is seen 
as disconnected from the body, action, and perception (Lakoff & Núñez, 2000). Geometry is an
important area for embodiment investigations because of its spatial, dynamic relations and the 
complex interplay between language, symbols, and action (Nathan & Walkington, 2017). 

One way in which mathematical reasoning is embodied is through gesture. We define 
gesture as personally-generated movements of the body that people use during reasoning about 
or communication of mathematical ideas. This follows McNeill (1992), who defines gestures as 
“all visible movements by the speaker” (p. 78) that do not involve object manipulation or actions 
like stroking one’s hair. Under this definition, tracing a circle in the air while thinking about a 
geometric problem involving a circle would be a gesture, as would tilting the head to indicate the 
movement of an object translated on a Cartesian plane. Here we consider only gestures that are 
mathematical in nature — gestures that relate to mathematical reasoning, rather than gestures 
given for emphasis or to show other social cues (like nodding). 

Learners’ tendencies to produce mathematical gestures have been shown to predict
learning and performance in mathematics (Cook & Goldin-Meadow, 2006), and students who
gesture more and gesture in particular ways communicate more accurate geometry proofs
(Nathan et al., 2014; Nathan & Walkington, 2017; Pier et al., 2019). Dynamic gestures (Garcia &
Infante, 2012; Göksun, Goldin-Meadow, Newcombe, & Shipley, 2013) — gestures where the
learner is depicting a motion-based transformation of a mathematical object through multiple
states — are strongly associated with proof performance (Pier et al., 2019). An example of a 
dynamic gesture is formulating a triangle with thumbs and forefingers, and then having the 
triangle grow and shrink to show mathematical similarity. However, it is unclear whether such 
gestures are simply a byproduct of valid mathematical reasoning, or a causative factor. In other 
words, does formulating gestures provide conceptual support that allows students to be more 
successful, or do students who tend to have stronger mathematical knowledge also tend to 
gesture more? 

One way to experimentally manipulate gesture is through gesture inhibition: physically
preventing learners from gesturing and examining how this affects reasoning. In the
present study, we examine gesture inhibition for university students proving geometric
conjectures. We examine how inhibition impacts mathematical reasoning and speech patterns,
and how this effect is moderated by student-level characteristics.


2. Theoretical framework 

2.1 Gesture as Simulated Action 

Hostetter and Alibali (2008) proposed the Gesture as Simulated Action (GSA) 
framework, where gestures come about as a result of perceptual and motor simulations which 
arise from mental imagery and language processing. Gestures arise when premotor activation
exceeds a speaker’s current gesture threshold, the level of motor activation needed for
a simulation to be expressed in overt action. This threshold can vary depending on factors such
as the current task demands (e.g., strength of motor activation when processing spatial imagery,
task difficulty), individual differences (e.g., level of spatial skills), and situational considerations
(e.g., social contexts). Hostetter and Alibali (2007) hypothesized that people with low verbal
skills but high spatial visualization skills would gesture most, as their mental images may not be 
well-connected to verbal forms they can orally communicate. Gestures are theorized to assist 
with “packaging” ideas for speech production (Alibali, Kita, & Young, 2000; Alibali, Yeo, 
Hostetter, & Kita, 2017); therefore, gesture production may be highest when speakers who have 
difficulty with verbal skills are presented with an organizationally demanding task (Alibali et al., 
2000). A related idea is that gestures facilitate lexical retrieval — they allow learners to produce 
more fluent speech by facilitating better retrieval of words from memory (Krauss, 1998). 

GSA suggests that inhibiting gesture may increase cognitive load, which may be 
particularly detrimental when learners are confronting a challenging task. Cognitive load is 
cognitive processing demands experienced by learners due to the relationship between a task’s 
difficulty and a learner’s cognitive system (van Merriénboer & Sweller, 2005). These demands 
draw upon working memory — the learner’s cognitive capacity for in-the-moment processing, 
holding, and manipulation of information. Working memory demands can be reduced by 
utilizing external resources in the environment (e.g., writing down a phone number) through 
cognitive offloading (Risko & Gilbert, 2016). Gestures may relieve cognitive load by acting as an 
off-loading mechanism, allowing learners to bring new memory stores, such as spatial working 
memory, to bear. Thus, restricting the availability of gestures may prevent this beneficial off- 
loading from happening. Alternately, gesture inhibition itself may increase extraneous cognitive 
load, as stopping oneself from gesturing could be an effortful activity that utilizes working 
memory. This may be particularly detrimental to people with a low gesture threshold. GSA
remains neutral on whether gesture inhibition itself draws on cognitive resources because it is an
effortful task, or whether gestures function to relieve cognitive demand (Hostetter & Alibali,
2008).
2.2 Cognition-action transduction 

Nathan (2017) proposed expanding the GSA framework, citing recent research suggesting
a bidirectional relation between action and cognition; this expanded theory is called cognition-action
transduction. In addition to the hypothesis that mental simulations give rise to gestures, as 
proposed by the GSA framework of Hostetter and Alibali (2008), Nathan (2017) describes 
emerging evidence that the act of gesturing can itself activate mental simulations. In accordance 
with this view, superior problem-solving performance has been demonstrated when students 
follow directions to perform specific actions that correspond to cognitive operations that 
contribute to effective problem-solving strategies (e.g., Goldin-Meadow, Cook, & Mitchell, 
2009; Ginns, Hu, Byrne, & Bobis, 2016; Hu, Ginns, & Bobis, 2015; Nathan et al., 2014; Novack 
et al., 2014). Cognition-action transduction allows that gesture inhibition could be detrimental, as 
people who are inhibited are not able to use gestures as a resource to understand new ideas. 
However, studies where learners are instructed to formulate particular highly effective gestures
differ from studies that allow learners to engage in their own personally-generated gesturing.
Instruction might be beneficial for low-knowledge learners who lack the resources to create their
own effective gestures; left to themselves, these learners may generate misleading gestures
illustrating incorrect relationships. Yet Nathan et al. (2014) found that even purposefully chosen
directed gestures could lead learners down an incorrect solution path if the learners did not
understand the relationships the gestures were intended to represent physically. Thus,
cognition-action transduction may predict that outward processes are correct and lead to a
desired change in cognitive state only when the person-environment system offers appropriate
feedback.

An alternate view is that gestures or directed actions may focus learners’ attention on the
specific concrete, spatial qualities of the objects they are physically representing or pointing to,
so that learners may not engage in generalization or abstraction. As described by Walkington,
Nathan, Wolfgram, Alibali, and Srisurichan (2014), such learners develop modal-specific
epistemological commitments: they are so focused on immediately present, salient
representations of concepts (like a concrete gesture) that they struggle to transfer this knowledge
to other representations (Sloutsky, Kaminski, & Heckler, 2005), such as a generalized mathematical
proof. From a cognition-action transduction standpoint, this could also lead them to give an
incorrect or incomplete problem solution.

One way to study whether and how gestures affect cognition is to manipulate gestures 
through gesture inhibition. 

2.3 Prior Gesture Inhibition Studies 

Having learners tap with one hand while solving problems, either in particular patterns that
periodically change to hinder automaticity (spatial tapping) or repeatedly in one place (simple
tapping), has been studied as a method of gesture inhibition. Results from this line of research
(Hegarty et al., 2005; Nathan & Martinez, 2015), with participants solving problems about
mechanical and biological systems, show that spatial tapping does indeed impair problem-solving
performance (η² = 0.2), while simple tapping does not. Hegarty et al. (2005) also found no effect
for gesture inhibition through hand restriction. Together, these findings suggest that it is not the
production of the gestures themselves that impacts performance (i.e., neither simple tapping nor
gesture inhibition impaired it). Instead, it is the demand of redirecting the processes involved in
monitoring and executing particular motor sequences in the spatial tapping condition that
selectively disrupts model-based reasoning, hampering performance on inference-making tasks. However,
other researchers have examined the effects of hand restriction and found different results. 
Several studies have looked at the impact of hand restriction on speech. Graham and 
Heywood (1975) found that gesture inhibition was associated with students using more spatial 
relation words, fewer demonstrative words, and more time spent pausing. Hostetter, Alibali, and 
Kita (2007) found that participants free to gesture used more semantically rich verbs (d=0.80) 
and were less likely to begin sentences with “and” (d=1.24). A similar study (Rauscher, Krauss,
& Chen, 1996) found that gesture inhibition was associated with speaking more slowly and
having more dysfluencies when discussing spatial content but speaking more quickly when 
discussing non-spatial content (ds=0.30-0.58). However, in a study of undergraduates giving 
others instructions on how to perform simple tasks, Hoetjes, Krahmer, and Swerts (2014) found 
no differences between hand-restricted and unrestricted participants on any speech category they 
measured — including speech duration or rate, number of words, pauses, or acoustic measures. 
Other studies have examined the effect of hand restriction on recall. Frick-Horbury and 
Guttentag (1998) found gesture inhibition led to lower retrieval (ds=0.64-1.04) and recall 
(ds=1.32-1.98) of words and that this effect did not vary based on SAT scores. Beattie and 
Coughlan (1999) found that gesture-inhibited students were actually more likely to recall words, 
although this difference did not reach significance (d=0.29). Those who were inhibited were 
significantly less likely to report a “tip of the tongue” state (i.e., where they believed they knew 
the word but could not retrieve it; d=0.60) but were also less likely to be able to resolve this state
when it happened (d=0.72).

Two recall studies have attempted to clarify the mechanisms through which gesturing
impacts recall. Goldin-Meadow, Nusbaum, Kelly, and Wagner (2001) asked participants to solve 
math problems while keeping words or letters in memory, and found recall was higher when 
gesturing (d=0.35). They also compared instances where the participant was free to gesture but 
chose not to, versus being free to gesture and choosing to gesture, and found similar advantages 
of gesturing for recall. They concluded that gesture allows for cognitive offloading. In a similar 
study (Wagner, Nusbaum, & Goldin-Meadow, 2004) participants were asked to hold either a 
string of letters or a visuospatial configuration of dots in memory. No differences were found in 
speech patterns when participants did versus did not gesture, but not gesturing was associated 
with weaker recall. In addition, not gesturing when uninhibited was associated with negative 
outcomes that were similar to being inhibited from gesturing. The researchers conclude that 
gesturing reduces the load on both visuospatial and verbal working memory. However, they 
found that gesturing was only beneficial when it conveyed information that was also in speech, 
supporting the idea that gesture is beneficial because it helps learners organize information into 
the propositional form needed for speech. Both studies are of limited relevance here, however,
because they do not contrast participants who were inhibited from gesturing with those who were not.

Alibali and Kita (2010) examined student explanations of Piagetian conservation tasks 
and found gesture inhibition caused children to express less perceptually present information 
(e.g., the glass is tall versus short) but more non-present information (η² = 0.16). When
inhibited, participants were more likely (η² = 0.28) to call upon past events (e.g., “they were the
same”), make hypotheses about things that might happen (e.g., “if the glass was fatter...”), and
talk about transformations (e.g., “you moved it over”). When considering mathematical
justifications, focusing on hypothetical states, past states, and transformations (rather than
immediately-present, salient characteristics of mathematical objects) could be desirable (Harel &
Sowder, 2005). This study points to the need to examine gesture inhibition for tasks like 
mathematical proof. 
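The studies reviewed in this section report effect sizes in two metrics, Cohen’s d and eta-squared (η²). As a rough aid for comparing them, the sketch below applies the standard conversion for a single-degree-of-freedom, two-group comparison, d = 2·√(η² / (1 − η²)), and its inverse. This is an approximation under stated assumptions (equal group sizes, a between-subjects effect); it may not match how the original studies computed their statistics, for example if they reported partial η² from within-subject designs. The function names are illustrative only.

```python
from math import sqrt


def eta_squared_to_d(eta_sq: float) -> float:
    """Convert eta-squared to Cohen's d for a two-group, 1-df effect.

    Assumes equal group sizes and a between-subjects comparison; the
    result is only approximate for partial eta-squared or
    repeated-measures designs.
    """
    if not 0.0 <= eta_sq < 1.0:
        raise ValueError("eta-squared must lie in [0, 1)")
    return 2.0 * sqrt(eta_sq / (1.0 - eta_sq))


def d_to_eta_squared(d: float) -> float:
    """Inverse conversion, Cohen's d to eta-squared, under the same assumptions."""
    return d ** 2 / (d ** 2 + 4.0)


if __name__ == "__main__":
    # Illustrative inputs: eta-squared values of the size quoted in this section.
    for eta_sq in (0.16, 0.20, 0.28):
        print(f"eta^2 = {eta_sq:.2f} -> d ~ {eta_squared_to_d(eta_sq):.2f}")
```

Under these assumptions, η² values between 0.16 and 0.28 correspond to d values of roughly 0.9 to 1.2, so comparisons with the d-based studies above should be read with the design differences in mind.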
2.4 Research purpose 

Prior research has not examined how gesture inhibition impacts mathematical reasoning 
generally, or geometric reasoning specifically, both of which have important visual, spatial, and 
motoric properties, in addition to powerful uses of language as a grounding mechanism (Lakoff 
& Núñez, 2000; Nathan, 2014). Mathematical reasoning is more complex than the simple recall
or descriptive tasks examined in previous studies and has directly-actionable pedagogical 
implications. Geometric reasoning is an especially important area for the examination of gesture 
inhibition, given its spatial nature, use of transformational reasoning, and the prevalence of 
gestures (Nathan & Walkington, 2017; Walkington, Chelule, Woods, & Nathan, in press). Prior 
research has also rarely examined whether effects of gesture inhibition vary based on learner
characteristics, which may influence students’ ability to formulate their own effective gestures.

Prior gesture inhibition studies have also often utilized small sample sizes and between- 
subjects comparisons. This makes it difficult to draw firm conclusions or conduct analyses of 
moderation effects, a primary focus here. Many prior studies have defined gestures as being done 
with the hands (e.g., Frick-Horbury & Guttentag, 1998; Goldin-Meadow et al., 2001; Rauscher et 
al., 1996; Wagner et al., 2004) and have not considered the various ways learners can still 
gesture with their hands restrained. Indeed, one study suggests learners with their limbs and 
hands restricted become more likely to gesture with other available body zones (e.g., eyebrows; 
Rimé, Schiaratura, Hupet, & Ghysselinckx, 1984). Finally, advances in computerized text analysis
of speech patterns (McNamara, Louwerse, Cai, & Graesser, 2013) and the recent identification
of dynamic gestures (e.g., Garcia & Infante, 2012) open up new possibilities for studying gesture
inhibition. The current study addresses each of these gaps.
2.5 Research questions and hypotheses 

Our research questions (RQs) are: 

1) How do (a) speech patterns, and (b) gesture and dynamic gesture patterns, vary when 
participants are inhibited versus not inhibited from gesturing? 

2) Does gesture inhibition impair recall, intuition, insight, or proof performance on 
geometric tasks, and does inhibition interact with participant characteristics? 

3) How is the presence of gesture and dynamic gesture associated with recall, intuition, 
insight, and proof for only those trials when participants were free to gesture, and how do 
gesture effects interact with participant characteristics? 

We pose four mutually exclusive hypotheses, with each hypothesis leading to different 
specific predictions for each research question. The first is the facilitation hypothesis, which 
posits that personally-generated gestures can causatively help learners. There are several 
conceptually distinct accounts in the literature as to why this might occur. First, gestures might 
reduce cognitive load through cognitive offloading. Second, gestures may allow learners to 
communicate their ideas better verbally, facilitating packaging of ideas into speech and/or 
allowing for lexical retrieval. Third, gestures might serve a transductive purpose by activating 
mental simulations, giving learners new, actionable ideas. 

Second is the interference hypothesis, which posits that gestures do not have facilitative 
properties, but that preventing learners from gesturing in and of itself is an effortful activity that 
increases cognitive load, perhaps due to the novelty or discomfort of being inhibited from
gesturing. Goldin-Meadow et al. (2001) describe this situation as one in which “the observed effect is not due
to the beneficial effects of gesture, but to the deleterious effects of the constraining instructions.
Asking speakers not to gesture is, in effect, asking them to do yet another task...” (p. 519).

Third is the byproduct hypothesis, which posits that gestures do not have facilitative
properties, and that being inhibited from gesturing is not a cognitively effortful activity. This
hypothesis views gestures as merely an outgrowth of valid mathematical reasoning: gestures tend
to co-occur with valid mathematical reasoning without playing a role in causing it. Fourth is the
concreteness hypothesis, which posits that gestures may cause learners to focus on currently
present, spatial, salient forms of mathematical concepts, disrupting opportunities for
mathematical abstraction or for hypothetical or deductive reasoning.

For RQ1a relating to speech patterns, the facilitation hypothesis would suggest that 
gesture inhibition would change speech patterns (H1a-facilitation), as facilitation in the form of 
offloading or improved retrieval or communication may allow learners to give longer proofs
that use more logical statements, describe more operations on objects, or that show more 
generalized or abstract thinking (Harel & Sowder, 2005; see Pier et al., 2019). In addition, a 
transduction effect may show proofs with more action and body-related words, as well as 
increased verb and cohesion measures related to situation models. Finally, when inhibited we 
may see speech changes consistent with prior studies reviewed above, such as fewer 
demonstrative words and semantically rich verbs. 

The interference hypothesis would also posit that gesture inhibition changes speech 
patterns (H1a-interference). When cognitive load is increased, learners may use speech patterns 
that involve shorter proofs, less description of mathematical operations, less abstract or
generalized language, and more dysfluent words or informal speech. The byproduct hypothesis
would posit no relationship between gesture inhibition and speech patterns (H1a-byproduct). The
concreteness hypothesis would posit that gesture inhibition causes participants to use language
patterns that involve fewer concrete and spatial words, and more abstract terms and deductive
language processes (H1a-concreteness).

For RQ1b regarding the impact of gesture inhibition on gesture, all four hypotheses 
would assume that gesture inhibition reduces gesture usage (H1b). 

For RQ2 relating to problem-solving performance, the facilitation hypothesis and the 
interference hypothesis would posit that gesture inhibition dampens problem-solving 
performance (H2-facilitation and H2-interference). The byproduct hypothesis would predict no 
relationship between gesture inhibition and performance (H2-byproduct), while the concreteness 
hypothesis would posit that gesture inhibition improves performance (H2-concreteness).

For RQ3 relating to participant performance while free to gesture, the facilitation 
hypothesis and the byproduct hypothesis would predict that when free to gesture, gestures are 
associated with improved performance (H3-facilitation and H3-byproduct). The interference 
hypothesis would predict no association between gestures and performance when free to gesture 
(H3-interference), while the concreteness hypothesis would predict gesture use to be associated 
with dampened performance when uninhibited (H3-concreteness). The hypotheses and the
research questions are summarized in Table 1.


Table 1

Summary of research questions and hypotheses

Outcome (RQ) | Facilitation | Interference | Byproduct | Concreteness

Inhibited versus not inhibited
Speech (RQ1a) | (H1a-facilitation) Inhibition may change language patterns, as retrieval and/or packaging and/or communication is facilitated, or transduction is instigated | (H1a-interference) Inhibition may change language patterns, as learners experience more cognitive load (e.g., more difficult word retrieval) | (H1a-byproduct) No differences | (H1a-concreteness) Inhibition may be associated with fewer concrete/spatial words, more abstract words, more hypothetical statements
Gesture (RQ1b) | (H1b, all hypotheses) Less gesture occurs when learners are inhibited from gesturing
Performance (Intuition, Insight, Proof; RQ2) | (H2-facilitation) Inhibition is harmful | (H2-interference) Inhibition is harmful | (H2-byproduct) No differences | (H2-concreteness) Inhibition is beneficial

Choose to gesture versus do not choose to gesture (uninhibited trials only)
Performance (Intuition, Insight, Proof; RQ3) | (H3-facilitation) Gesture use associated with higher performance | (H3-interference) No differences | (H3-byproduct) Gesture use associated with higher performance | (H3-concreteness) Gesture use associated with lower performance

Research Questions 2-3 also allow that these hypotheses may not hold uniformly across
participants. For example, offloading might only be beneficial when the learner has a weaker
background in mathematics and needs to relieve working memory demands for in-the-moment
problem solving, while the hypothesized verbal support from gestures may only be beneficial
when the learner has low fluency skills and needs gestures to help retrieve words and package
ideas into speech. Transduction might only occur when the learner has strong enough spatial
skills to produce useful gestures that give them new ideas. Interference might only occur when the
learner has a weak mathematical background and the cognitive system is already overloaded.
Gestures may only be a byproduct of valid mathematical reasoning when learners have high
enough spatial skills to formulate mathematical gestures to accompany speech. Participants with
low spatial skills may be especially likely to fall prey to difficulty generalizing from concrete
representations. By examining a variety of moderators, we can consider whether such differential
effects are occurring.

Finally, research question 3 also considers dynamic gestures as a special class of gesture
that is particularly relevant to mathematical reasoning. We hypothesize that the presence of
dynamic gestures will have a stronger association with insight and proof during uninhibited
trials than the presence of any gesturing. This is because dynamic gestures show
transformations, which are central to understanding and generalizing mathematical
relationships; such understanding and generalization are key to having insights and
formulating proofs of geometric conjectures.

3. Methods
3.1 Participants 

Undergraduate and graduate students (n=108; 48 male and 60 female) from a private 
university were recruited to participate in a laboratory study lasting 30-45 minutes. Math and 
statistics majors and graduate students were specifically targeted through department emails, 
signs, and class visits. Of the 108 participants, 34 were math or statistics majors, 7 were math or 
statistics graduate students, 5 were engineering majors, and 5 were science majors. The 
remaining 57 participants were undergraduate non-STEM majors or undeclared majors. The 
mean age was 20.41 years (SD=2.18). Fifty-five participants had taken a math class above 
calculus 1, 26 had taken up to calculus 1, and 27 had taken below calculus 1. Fifty-eight 
participants identified as Caucasian, 17 as Asian, 12 as African-American, 12 as Hispanic, and 9 
as other races or biracial; 23 participants reported being non-native English speakers. One 
participant (a female math major) was omitted during the coding phase because she was not able 
to speak English well enough to respond to the prompts, for a final sample of 107. 


3.2 Power Analysis 
An a priori power analysis was conducted with G*Power 3.1.9.2 (Faul, Erdfelder,
Buchner, & Lang, 2009) using power (1 − β) = 0.80 and α = 0.05. Based on previous data (Nathan &
Walkington, 2017), correlations among a participant’s scores on repeated geometry proofs were
estimated at 0.6. We estimated an effect size of d=0.6 for gesture inhibition on proof performance
from Walkington et al. (2014), a small pilot study where 15 participants proved a similar set of 
geometry conjectures while inhibited or not inhibited for all trials. This study was used because it 
had the closest outcome variable to the present study (mathematical proof performance); 
however, this effect size was generally consistent with effect sizes in other studies reviewed 
earlier. We additionally took into account being powered to detect mediation effects of gesture 
and speech (assuming partial mediation and small to medium-sized paths for alpha and beta, 
d=0.26-0.39; Fritz & MacKinnon, 2007), which led to a sample size of 86, taking into account 
the design effect. This was our minimum, but we accepted all participants who responded to the 
ads, with the restriction that we wanted to keep the number of non-STEM majors relatively 
balanced with the number of STEM majors. With 107 participants, a post-hoc sensitivity analysis 
showed that we should be able to detect effects for gesture inhibition that are as small in size as 
d=0.2. 
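The inhibition component of such a calculation can be approximated by hand. The sketch below is a simplified illustration, not a reproduction of the G*Power analysis: it assumes a single two-sided paired comparison (inhibited versus uninhibited) evaluated with a normal approximation, converts the between-condition effect d into the paired-difference effect d_z = d / √(2(1 − r)) using the assumed within-participant correlation r, and ignores the mediation paths and design effect that produced the reported minimum of 86. The function names are hypothetical.

```python
from math import ceil, sqrt
from statistics import NormalDist


def paired_effect_size(d: float, r: float) -> float:
    """Convert a standardized mean difference d into d_z for paired data.

    Assumes equal variances in both conditions and a correlation r
    between a participant's scores in the two conditions.
    """
    return d / sqrt(2.0 * (1.0 - r))


def paired_sample_size(dz: float, alpha: float = 0.05, power: float = 0.80) -> int:
    """Normal-approximation n for a two-sided paired comparison.

    Slightly underestimates the exact paired t-test answer for small n.
    """
    std_normal = NormalDist()
    z_alpha = std_normal.inv_cdf(1.0 - alpha / 2.0)  # two-sided critical value
    z_power = std_normal.inv_cdf(power)             # quantile for 1 - beta
    return ceil(((z_alpha + z_power) / dz) ** 2)


if __name__ == "__main__":
    dz = paired_effect_size(d=0.6, r=0.6)
    print(f"d_z = {dz:.3f}")
    print("approximate n for the inhibition contrast:", paired_sample_size(dz))
```

With d = 0.6 and r = 0.6, d_z is about 0.67 and the approximation calls for roughly 18 participants (an exact t-based calculation needs slightly more). This illustrates why the mediation analyses, rather than the main inhibition effect, set the minimum sample size.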

3.3 Procedures 

Participants engaged in a one-on-one session with an interviewer. They were given four 
pre-measures (described later) and presented with 8 geometry conjectures (Table 2). The 
conjectures were ordered via a Latin Square and projected onto a screen one at a time. 
Participants were asked to read each conjecture out loud and state whether each conjecture was 
true or false and why it was true or false. For 4 of the 8 conjectures, participants were inhibited
from gesturing by placing their hands in oven mitts that were attached to bottles fixed to a
music stand (Figure 1). Participants were inhibited for either the first four or the final four
conjectures; inhibition order was counterbalanced. After completing all 8 conjectures,
participants were asked to recall, while uninhibited, as many of the conjectures as possible. All
interviewer interaction with the participant was scripted to ensure uniform treatment of inhibited
versus uninhibited trials.
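The specific Latin Square and the rule for assigning participants to its rows are not described here. As one possible scheme, sketched under those assumptions, a cyclic 8 × 8 Latin square places each conjecture in each serial position exactly once across eight consecutive participants, and alternating the inhibition order between successive blocks of eight participants counterbalances inhibited-first and inhibited-second sessions. The helper names and the block-alternation rule are hypothetical.

```python
N_CONJECTURES = 8


def cyclic_latin_square(n: int) -> list[list[int]]:
    """Row i is the sequence i+1, i+2, ..., wrapping around modulo n."""
    return [[(row + col) % n + 1 for col in range(n)] for row in range(n)]


def session_plan(participant: int, n: int = N_CONJECTURES) -> list[tuple[int, bool]]:
    """Return (conjecture number, inhibited?) pairs in presentation order.

    Hypothetical assignment rule: participant k uses row k mod n, and
    successive blocks of n participants alternate between inhibiting the
    first half and the second half of the session.
    """
    order = cyclic_latin_square(n)[participant % n]
    inhibited_first = (participant // n) % 2 == 0
    half = n // 2
    return [
        (conjecture, (position < half) == inhibited_first)
        for position, conjecture in enumerate(order)
    ]


if __name__ == "__main__":
    for participant in (0, 1, 8):
        print(participant, session_plan(participant))
```

Under this rule, every conjecture is presented in every position and is inhibited in exactly half of each 16-participant cycle, so order and inhibition effects are balanced across conjectures.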


Table 2

Conjectures used in study, with average success rates for proof, insight, intuition, and recall

# | Conjecture | Verity | Proof correct | Insight correct | Intuition correct | Recall correct
1 | An angle bisector of any angle of a triangle also bisects the opposite side. | False | 6.5% | 23.4% | 31.8% | 61.68%
2 | Any translation can be expressed as the composition of two reflections. | True | 5.6% | 23.4% | 43.0% | 53.27%
3 | If one angle of a triangle is larger than a second angle, then the side opposite the first angle is longer than the side opposite the second angle. | True | 29.9% | 63.6% | 88.8% | 56.07%
4 | The area of a parallelogram is the same as the area of a rectangle with the same length and height. | True | 29.0% | 75.7% | 81.3% | 70.09%
5 | The segment that joins the midpoints of two sides of any triangle is parallel to the third side. | True | 11.2% | 46.7% | 63.6% | 50.47%
6 | Any rotation can be expressed as the composition of two reflections. | True | 4.7% | 24.3% | 51.4% | 54.21%
7 | Given that you know the measure of all three angles of a triangle, there is only one unique triangle that can be formed with these three angle measurements. | False | 35.5% | 39.3% | 52.3% | 43.93%
8 | If there are three points P, R, and Q in space, and the distance between P and Q equals the sum of the distance between P and R and the distance between R and Q, then points P, R, and Q must all lie along the same line. | True | 25.2% | 43.0% | 56.1% | 50.47%
Figure 1. Gesture inhibition rig


3.4 Measures 

Geometry Pretest. Participants were given a geometry knowledge pretest, developed in a prior study (Nathan & Walkington, 2017; r = 0.56 with performance on conjectures similar to those considered here), composed of twelve statements about triangles, parallelograms, and circles. Although the pretest had a moderate correlation with student performance on the conjectures in the present study (r = 0.41 with proof), it was ultimately not used in the models because of issues with internal consistency. Results for gesture inhibition were the same with or without the pretest included, and the pretest did not significantly interact with gesture inhibition in any model.

Spatial Skills Test. Participants were given the Paper Folding Test from the Kit of Factor-Referenced Cognitive Tests (Ekstrom, French, Harman, & Derman, 1976), which measures participants’ ability to visualize and manipulate images. Scores are calculated as the number of items correct minus one-quarter of the number incorrect. Reliability for this measure is 0.75 for males and 0.77 for females.
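The scoring rule above is a correction-for-guessing formula. As a minimal sketch (a hypothetical helper, not part of the test kit or the study's materials), it can be written as follows, assuming that unanswered items count toward neither the correct nor the incorrect total:

```python
def paper_folding_score(n_correct: int, n_incorrect: int) -> float:
    """Correction-for-guessing score: correct answers minus one-quarter of incorrect ones.

    Hypothetical helper; unanswered items are assumed to be counted in neither total.
    """
    if n_correct < 0 or n_incorrect < 0:
        raise ValueError("item counts must be non-negative")
    return n_correct - n_incorrect / 4


# A participant with 14 correct and 6 incorrect answers scores 14 - 6/4.
print(paper_folding_score(14, 6))  # 12.5
```

Because wrong answers are penalized but omissions are not, the formula discourages blind guessing without punishing participants who skip items they cannot solve.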

Phonemic Fluency Test. The phonemic fluency measure tests speakers’ ability to manage the organizational demands of speaking by rapidly generating words according to a criterion (initial letter) by which words are not typically organized in the lexicon; here, participants named as many words as they could in 60 seconds beginning with the letter ‘s’, and then with the letter ‘t’. The score is the number of words generated in 60 seconds (omitting proper nouns and simple variants). Retest reliability is 0.88 (desRosiers & Kavanagh, 1987). For the present data, we examined the correlation between the count of words each participant generated beginning with “t” and the count beginning with “s”; the correlation for these values (including word variants) was 0.73. Approximately 19% of the words participants generated for this test were simple variants of words they had already generated or proper nouns (19.7% for “s” and 18.6% for “t”). The mean (14.8) and standard deviation (4.8) for the phonemic fluency measure here were similar to those in other studies that have used this measure (Hostetter & Alibali, 2007; Yeudall, Fromm, Reddon, & Stefanyk, 1986).

Demographic Questionnaire. A paper questionnaire asked participants to identify their class/year, race/ethnicity, Math ACT/SAT scores, college major(s), age, gender, native language, and prior and current math courses taken. ACT/SAT scores were not used because 42 of the 108 responses were missing or ambiguous.

3.5 Coding 
Participants were video-recorded, and their speech was transcribed in the Transana video analysis software (Woods & Fassnacht, 2012). Individual clips were made of each participant proving each conjecture, for a total of 856 clips. One STEM major had his hands off-camera while uninhibited; thus, his uninhibited data were omitted from gesture analyses.

Coding of Proof. Participants’ oral proofs for each conjecture were scored as correct (1) or incorrect (0) using a codebook (see Appendix A) developed from the criteria for valid mathematical proofs given by Harel and Sowder (2005). A Cohen’s Kappa reliability of 0.81 was achieved for 100 randomly selected double-coded clips.
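The inter-rater reliabilities in this section are Cohen's kappa values, kappa = (p_o - p_e) / (1 - p_e), where p_o is observed agreement and p_e is the agreement expected by chance from each coder's code frequencies. The paper does not state which software computed them; the sketch below is an illustrative from-scratch Python version applied to invented 0/1 codes, not the authors' procedure.

```python
from collections import Counter


def cohens_kappa(coder_a, coder_b):
    """Cohen's kappa for two coders who each assigned one code per clip."""
    if len(coder_a) != len(coder_b) or not coder_a:
        raise ValueError("need two non-empty code lists of equal length")
    n = len(coder_a)
    # Observed agreement: share of clips where both coders gave the same code.
    p_observed = sum(a == b for a, b in zip(coder_a, coder_b)) / n
    # Chance agreement from each coder's marginal code frequencies.
    freq_a, freq_b = Counter(coder_a), Counter(coder_b)
    p_chance = sum(freq_a[c] * freq_b[c] for c in set(freq_a) | set(freq_b)) / n**2
    if p_chance == 1:
        raise ValueError("kappa is undefined when both coders used a single identical code")
    return (p_observed - p_chance) / (1 - p_chance)


# Invented proof codes (1 = correct, 0 = incorrect) for ten double-coded clips.
coder_1 = [1, 1, 0, 0, 1, 0, 1, 0, 0, 0]
coder_2 = [1, 1, 0, 0, 0, 0, 1, 0, 1, 0]
print(round(cohens_kappa(coder_1, coder_2), 3))  # 0.583
```

Here the coders agree on 80% of clips, but because each coded 40% of clips as correct, 52% agreement would be expected by chance, so kappa is noticeably lower than raw agreement.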

Coding of Insight. Given the relatively low rate of valid proofs, a complementary 
measure (insight) was included to assess whether participants demonstrated understanding of 
some of the key ideas, without necessarily getting all the way to a full deductive argument. 
Insight was defined according to Zhang, Lei, and Li (2016) as conscious retrieval of activated 
mathematical properties and examples that are both validly applied and relevant to the conjecture 
at hand. See Appendix A for how this was operationalized. Cohen’s Kappa reliability of 0.80 
was achieved for 100 randomly-selected double-coded clips. 

Coding of Intuition. Participants’ justifications were also coded for whether they correctly 
concluded the conjecture was true or false. If the participant changed their answer, only the final 
answer was considered for the coding. One hundred clips were randomly selected from the 
corpus for independent double-coding; a Cohen’s Kappa reliability of 0.97 was achieved. 

Coding of Recall. Participants were asked at the end of the session to repeat as many of 
the conjectures as they could remember. If the participant recalled a conjecture, they received a 
code of “1”, otherwise they received a code of “0.” Cohen’s kappa of 0.96 was achieved for 100 
randomly-selected double-coded clips. 

Coding of Gesture. Each clip was coded for whether the participant made (1) any gesture at all that represented, pointed to, or indicated movement or transformation of mathematical objects (e.g., sweeping the head left and upwards when indicating the upper left corner of a quadrilateral), or (2) any dynamic gesture, in which the participant showed a movement-based transformation of an object through multiple states (e.g., showing a translation of a mathematical figure by tilting the head or moving the hands). These categories were 0/1 variables, and gestures could fall into multiple categories. Reliability for inhibited versus uninhibited trials was
could fall into multiple categories. Reliability for inhibited versus uninhibited trials was 
calculated separately, due to the increased challenge of coding inhibited trials and differing 
gesture rates. Kappas of 0.90 (any gesture) and 0.87 (dynamic gesture) were achieved between 
two coders for the uninhibited trials on a random subset of 50 clips, while kappas of 0.81 (any 
gesture) and 0.81 (dynamic gesture) were achieved between two coders for the inhibited trials on 
a random subset of 50 clips. 

Prior gesture inhibition studies have largely defined gestures in terms of hand movements; thus, this coding represents a methodological expansion. Participants inhibited from gesturing used their heads to point to imagined figures in front of them or to show an object’s movement. In addition, although not used in statistical analyses because of their rarity, we also looked at how often participants attempted hand gestures when gestures were inhibited. Occasionally, the camera captured slight but visible movements indicating that participants were moving their fingers inside the oven mitts in ways that seemed related to their mathematical reasoning. We found that while hand gestures occurred in 62.74% of all uninhibited trials, they occurred in only 4.45% of inhibited trials.

Coding of Speech. To examine the differences in speech patterns proposed in our hypotheses, the transcript of each participant’s speech was entered into two text analysis software packages, Coh-Metrix and LIWC. Coh-Metrix (McNamara et al., 2013) codes texts based on 108 categories, which range from the number of words per sentence to the average concreteness and age of acquisition of words. Prior research has found that Coh-Metrix can distinguish important elements of mathematical arguments, such as connective words, actions on mathematical objects, and deductive statements, as well as progression through a logical structure via sentence overlap (Nathan et al., 2018; Pier et al., 2019). LIWC (Pennebaker, Chung, Ireland, Gonzales, & Booth, 2007) is a dictionary-based computerized text analysis tool that counts the number of words in the text falling into 70 categories. Although not all LIWC categories are relevant, many (such as cognitive process words) have been found to be important in prior investigations of oral proofs (Pier et al., 2019). All language variables tested are listed in Appendix B.

As indicated by our hypotheses (Table 1), we only investigated speech differences for 
inhibited versus uninhibited trials. How speech categories relating to mathematical proof 
practices change when learners choose to gesture versus not gesture has been examined 
extensively in other work (see Pier et al., 2019). By comparing the coded speech categories used 
during inhibited trials to uninhibited trials, we can see if there were significant differences in the 
kinds of words, phrases, and patterns of speech participants used that reflected the differences we 
proposed in our hypotheses. 

3.6 Analysis 

Our hypotheses (Table 1) relate to three comparisons: (1) participants’ performance on inhibited versus uninhibited trials, (2) participants’ speech patterns in inhibited versus uninhibited trials, and (3) participants’ performance when they chose to gesture versus not gesture (when uninhibited). For the first comparison, we modelled performance as an outcome with inhibited/uninhibited as a predictor. For the second comparison, we modelled our quantitatively coded speech categories from Coh-Metrix and LIWC as an outcome, with inhibited/uninhibited as a predictor. For the third comparison, we modelled performance as an outcome (including only the uninhibited trials), with gesture/no gesture as a predictor.

For Research Question 1a, as an initial screening step, we first removed from consideration language categories in Coh-Metrix and LIWC that (1) were intended for text that has clear sentence delineations (rather than natural speech; e.g., number of sentences), (2) related to the use of punctuation (e.g., incidence of semicolons), or (3) were “0” (i.e., not present) for 75% or more of the data points. This left us with 53 different categories from LIWC and 95 different categories from Coh-Metrix (see Appendix B). All variables were continuous with the exception of word count. We then fit mixed effects linear models (with student and conjecture as random effects) predicting each language category with inhibition condition as a predictor. We used a cluster bootstrap to estimate the standard errors of the regression coefficients, because it can deal with non-normality and heteroscedasticity and works for data with many “0” values, as is typical for readability variables. Bootstrap samples were drawn by sampling individuals rather than observations, thus taking into account the nesting of observations within participants. We implemented the cluster bootstrap procedure by (i) generating 1000 bootstrap samples using the ClusterBootstrap package in R, (ii) applying the linear mixed effects models to each bootstrap sample, and (iii) computing the regression coefficients’ standard errors from the bootstrap distribution. We applied the False Discovery Rate p-value correction (Benjamini & Hochberg, 1995) to the significance tests for the regression coefficients.
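The two procedures in the paragraph above, resampling whole participants and the Benjamini-Hochberg adjustment, can be sketched in plain Python. This is an illustration under stated assumptions, not the authors' R code: in place of refitting a linear mixed model in each bootstrap sample, the sketch uses a simple difference in means between inhibited and uninhibited trials as the statistic, and the demo data are randomly generated rather than taken from the study. The function names are hypothetical.

```python
import random
from statistics import mean, stdev


def inhibition_effect(rows):
    """Stand-in statistic: mean outcome on inhibited trials minus uninhibited trials.

    The study refit a linear mixed model in every bootstrap sample; this simple
    difference in means is an assumption made to keep the sketch self-contained.
    """
    inhibited = [y for _, cond, y in rows if cond == 1]
    free = [y for _, cond, y in rows if cond == 0]
    return mean(inhibited) - mean(free)


def cluster_bootstrap_se(rows, statistic, n_boot=1000, seed=0):
    """Bootstrap SE that resamples participants (clusters), not individual trials."""
    rng = random.Random(seed)
    by_participant = {}
    for row in rows:
        by_participant.setdefault(row[0], []).append(row)
    ids = list(by_participant)
    estimates = []
    for _ in range(n_boot):
        # Draw participants with replacement and keep all of each one's trials,
        # so the nesting of trials within participant is preserved.
        drawn = rng.choices(ids, k=len(ids))
        sample = [row for pid in drawn for row in by_participant[pid]]
        estimates.append(statistic(sample))
    return stdev(estimates)


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg (FDR) adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity and a cap of 1.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Invented demo data (not study data): (participant_id, inhibited 0/1, language score),
# 20 participants x 8 trials, each participant contributing both conditions.
gen = random.Random(1)
demo = [(pid, trial % 2, gen.gauss(10, 2)) for pid in range(20) for trial in range(8)]

se = cluster_bootstrap_se(demo, inhibition_effect)
print(f"cluster-bootstrap SE of the inhibition effect: {se:.3f}")
print(benjamini_hochberg([0.01, 0.04, 0.03, 0.20]))
```

Resampling at the participant level works here because every participant contributes both inhibited and uninhibited trials, so each bootstrap sample always contains both conditions. With 148 language categories tested, the false discovery rate adjustment guards against reporting chance differences as effects.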

For Research Question 1b, mixed effects logistic regression models were fit using the glmer() function in the lme4 package for R (Bates, Maechler, Bolker, & Walker, 2014). Models predicted any gesture (coded 0/1) and dynamic gesture (coded 0/1), with participant, conjecture, and which of the 8 Latin square orders they received as random effects. Fixed effects included whether the participant was inhibited for the conjecture, demographic variables (gender, language), and expertise variables (geometry pre-test score, spatial score, phonemic fluency score, STEM/non-STEM major, highest math course taken).

For Research Question 2, similar models were fit predicting correct recall, intuition, insight, and proof (each coded 0/1). For the recall model only, an additional fixed effect was added to identify in what order during the session (1-8, a factor variable) participants had received the conjecture. We first fit main effects models examining the impact of gesture inhibition, and then examined two-way interactions between gesture inhibition and the other fixed effects. Interactions were retained only if they significantly reduced deviance, as assessed with the anova() function in R. The regression tables present raw coefficients (log-odds), which can be exponentiated to obtain odds ratios. Standardized mean difference-type effect sizes were calculated using the method in Chinn (2000).
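The conversion from a logistic-regression coefficient to the reported odds ratios and effect sizes can be made explicit. Chinn (2000) converts a log odds ratio to a standardized mean difference by multiplying it by the square root of 3 divided by pi (equivalently, dividing by about 1.81). The helper names below are hypothetical; as a check, the sketch uses the Male coefficient from the insight model in Table 5 (B = 0.49), for which the Results section reports Odds = 1.63 and d = 0.27.

```python
import math

# Chinn (2000): d = ln(OR) * sqrt(3) / pi, i.e. roughly ln(OR) / 1.81.
CHINN_FACTOR = math.sqrt(3) / math.pi


def odds_ratio(b: float) -> float:
    """Exponentiate a logistic-regression coefficient (log-odds) to get an odds ratio."""
    return math.exp(b)


def chinn_d(b: float) -> float:
    """Convert a log-odds coefficient to a standardized mean difference (Chinn, 2000)."""
    return b * CHINN_FACTOR


# Male coefficient in the insight model (B = 0.49).
b_male = 0.49
print(f"OR = {odds_ratio(b_male):.2f}, d = {chinn_d(b_male):.2f}")  # OR = 1.63, d = 0.27
```

Because d is a rescaled log odds ratio, an odds ratio below 1 always yields a negative d, and the spatial-score coefficients (e.g., B = 0.06) translate into small per-point effects (d of about 0.03).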

For Research Question 3, the analysis process given above for Research Question 2 was repeated for the subset of trials in which participants were uninhibited (n = 428 trials). Additionally, dynamic gesture and any gesture were added as predictors.


4 Results 
Table 3 gives descriptive statistics for our measures. As the table shows, our four performance measures (intuition, recall, insight, and proof) had different average accuracy levels: 58.5%, 55.0%, 42.4%, and 18.5%, respectively. Because we coded the validity of participants’ mathematical reasoning at several levels of task difficulty, we reduced concerns about ceiling or floor effects. We also conducted supplementary analyses in which we examined only the easier or only the more difficult conjectures, and the results were the same.


Table 3 


Average rates of correctness and gesture incidence 
Measure | All Participants (N = 107)
Geometry Pre-Test, Mean (SD) | 80.61% (12.26%)
Spatial Skills, Mean (SD) | 12.20 (4.70)
Phonemic Fluency, Mean (SD) | 14.87 (3.84)

Measure | Inhibited (428 trials) | Not Inhibited (428 trials)
% Trials Correct Proof | 18.22% | 18.69%
% Trials Correct Insight | 43.46% | 41.36%
% Trials Correct Intuition | 59.11% | 57.94%
% Trials Correct Recall | 55.61% | 54.44%
% Trials Any Gesture | 29.67% | 69.81%
% Trials Dynamic Gesture | 10.75% | 32.78%


4.1 RQ1: Association between gesture inhibition and speech and gesture 


To examine how speech patterns varied when participants were inhibited versus not inhibited, we used a cluster bootstrap with mixed effects regression models to assess whether there were differences between uninhibited and inhibited trials on the 148 different language measures. None were significant (consistent with H1a-byproduct). See Appendix C for information on the regression coefficients and significance tests. Appendix B also includes Cohen’s d values showing the effect size of gesture inhibition for each speech category; all d values were less than or equal to 0.3, and only three categories (past tense, positive emotion, and affect) had a d value greater than 0.2.


Participants inhibited from gesturing were less likely to make any gesture (Odds=0.09, d=-1.32, p<.001; consistent with H1b) and dynamic gestures (Odds=0.14, d=-1.07, p<.001). Whether or not the gesture inhibition variable was included, no other fixed effect predicted gesture. When only the uninhibited trials were examined, the sole significant effect was that students who had taken calculus 2 or a higher math course were more likely to make dynamic gestures than those who had taken only calculus 1 (Odds = 3.36, d = 0.67, p = 0.034) or those who had not taken calculus (Odds = 3.55, d = 0.70, p = 0.036).
4.2 RQ2: Association between gesture inhibition and outcomes

For the recall model (Model 1 in Table 4), gesture inhibition had no effect (p=0.62, d=0.04). The only fixed effect predictive of recall was course taking: participants whose highest math class was below calculus 1 were less likely to recall a conjecture than participants whose highest course was beyond calculus 1 (Odds=0.46, d=-0.43, p=0.020). The order in which participants received conjectures was highly significant but is not shown in the table for brevity. For the intuition model (Model 2), gesture inhibition had no effect (p=0.77, d=0.02). No other fixed effects predicted intuition. Models 1 and 2 were re-fit to examine whether gesture inhibition interacted with each of the other variables. No interaction terms were statistically significant.


Table 4 


Models predicting recall and intuition 


Fixed Effects | Model 1: Main Effects, Recall | Model 2: Main Effects, Intuition
 | B (SE) | B (SE)
(Intercept) | -0.39 (0.39) | 0.54 (0.41)
Gestures Not Inhibited | (ref.) | (ref.)
Gestures Inhibited | 0.08 (0.16) | 0.05 (0.15)
Female | (ref.) | (ref.)
Male | 0.03 (0.21) | 0.18 (0.19)
Native English | (ref.) | (ref.)
Non-Native English | 0.06 (0.28) | -0.06 (0.25)
Non-STEM major | (ref.) | (ref.)
STEM major | -0.31 (0.30) | -0.02 (0.26)
Phonemic Fluency | 0.05 (0.03) | -0.01 (0.03)
Course Beyond Calc 1 | (ref.) | (ref.)
Course Below Calc 1 | -0.78 (0.34)* | -0.31 (0.31)
Course Calc 1 | -0.54 (0.31) | -0.39 (0.27)
Spatial Score | 0.02 (0.02) | 0.02 (0.02)

Note. (ref.) denotes the reference category. * = p < .05


In the main effects model predicting insight (Model 3 in Table 5), spatial test score was significantly positively associated with correct insights (d=0.034 per 1 point on the test, p=.014). Being male was also associated with a greater likelihood of correct insight (Odds=1.63, d=0.27, p=0.024). Compared with participants whose highest math course was beyond calculus 1, those whose highest course was below calculus 1 (Odds = 0.49, d=-0.40, p = 0.044) or calculus 1 only (Odds = 0.51, d= -0.37, p = 0.034) were less likely to give a correct insight. There was no main effect for gesture inhibition (p=0.44, d=0.07).

In the main effects model predicting proof (Model 4), spatial test score was significantly positively associated with correct proofs (d=0.07 per 1 point on the test, p=.001). Being male was also associated with a greater likelihood of correct proof (Odds=1.89, d=0.35, p=0.023). Having a highest math course beyond calculus 1 was associated with a higher likelihood of correct proof, compared with calculus 1 only (Odds=2.96, d=0.60, p=0.011) or below calculus 1 (Odds = 2.81, d = 0.57, p = 0.030). There was no main effect for gesture inhibition (p=0.88, d=0.02). Models 3 and 4 were re-fit to examine whether gesture inhibition interacted with each of the other variables. No interaction terms were statistically significant (ps>.05).

In sum, gesture inhibition did not predict any of the performance measures, consistent with H2-byproduct.


Table 5 


Models predicting insight and proof 
Fixed Effects | Model 3: Main Effects, Insight | Model 4: Main Effects, Proof
 | B (SE) | B (SE)
(Intercept) | -0.24 (0.45) | -2.02 (0.56)***
Gestures Not Inhibited | (ref.) | (ref.)
Gestures Inhibited | 0.12 (0.16) | 0.03 (0.21)
Female | (ref.) | (ref.)
Male | 0.49 (0.22)* | 0.64 (0.28)*
Native English | (ref.) | (ref.)
Non-Native English | -0.57 (0.29) | -0.38 (0.37)
Non-STEM major | (ref.) | (ref.)
STEM major | 0.06 (0.30) | 0.23 (0.39)
Phonemic Fluency Score | 0.01 (0.03) | -0.02 (0.04)
Course Beyond Calc 1 | (ref.) | (ref.)
Course Below Calc 1 | -0.72 (0.36)* | -1.04 (0.48)*
Course Calc 1 | -0.67 (0.32)* | -1.08 (0.42)*
Spatial Score | 0.06 (0.02)* | 0.12 (0.04)**

Note. (ref.) denotes the reference category. * = p < .05, ** = p < .01, *** = p < .001


4.3 RQ3: Association between gesture and outcomes when uninhibited 


We next considered only trials where participants were free to gesture (Table 6). For the models predicting recall and intuition, neither gesture type (any gesture, dynamic gesture) was associated with outcomes. For the models predicting insight, dynamic gestures were associated with valid insights (Odds=2.27, d=0.45, p=0.014), but any gesture was not (p=0.16, d=0.21). None of the other predictors had a significant interaction with dynamic gestures. For the models predicting proof, dynamic gestures were strongly associated with valid proofs (Odds=3.81, d=0.74, p<.001), whereas any gesture was not (p=0.45, d=0.14). None of the other predictors had a significant interaction with dynamic gestures. Being male was associated with a higher likelihood of insight and proof across models, while being a non-native English speaker was associated with a lower likelihood of insight. In all four models, spatial test score significantly positively predicted valid proofs and insights. In the dynamic gesture proof model, being a STEM major significantly positively predicted valid proofs.

Because gestures predicted performance during uninhibited trials, these results are consistent with H3-byproduct. We would also expect this effect under H3-facilitation, but the results from RQ2 rule out facilitation as the appropriate hypothesis.


Table 6 


Models predicting insight and proof, with gesture as predictor 


Fixed Effects | Model 5: Any Gesture, Insight | Model 6: Dynamic Gesture, Insight | Model 5: Any Gesture, Proof | Model 6: Dynamic Gesture, Proof
 | B (SE) | B (SE) | B (SE) | B (SE)
(Intercept) | -0.74 (0.57) | -0.84 (0.58) | -2.79 (0.75) | -3.40 (0.79)
Female | (ref.) | (ref.) | (ref.) | (ref.)
Male | 0.66 (0.26)* | 0.70 (0.27)** | 0.91 (0.34)** | 1.02 (0.35)**
Native English | (ref.) | (ref.) | (ref.) | (ref.)
Non-Native English | -0.87 (0.36)* | -0.81 (0.36)* | -0.30 (0.44) | -0.18 (0.44)
Non-STEM major | (ref.) | (ref.) | (ref.) | (ref.)
STEM major | 0.27 (0.36) | 0.36 (0.36) | 0.80 (0.49) | 1.01 (0.51)*
Phonemic Fluency | 0.03 (0.03) | 0.03 (0.03) | 0.00 (0.04) | -0.02 (0.04)
Course Beyond Calc 1 | (ref.) | (ref.) | (ref.) | (ref.)
Course Below Calc 1 | -0.54 (0.41) | -0.48 (0.41) | -0.62 (0.58) | -0.13 (0.54)
Course Calc 1 | -0.41 (0.38) | -0.33 (0.38) | -1.10 (0.53)* | -0.75 (0.52)
Spatial Score | 0.06 (0.03)* | 0.06 (0.03)* | 0.12 (0.05)* | 0.13 (0.05)**
Any Gesture | 0.38 (0.27) | | 0.26 (0.34) |
Dynamic Gesture | | 0.82 (0.33)* | | 1.34 (0.40)***

Note. (ref.) denotes the reference category. * = p < .05, ** = p < .01, *** = p < .001


5 Discussion 
Our results support our third hypothesis, the byproduct hypothesis (Table 1; H1a-byproduct, H1b, H2-byproduct, H3-byproduct). Gesture inhibition had no significant effect on a variety of outcome measures, including speech patterns, recall, and giving valid intuitions, insights, and proofs for geometry conjectures. Analyses of interaction effects suggested that inhibition had no significant effect regardless of gender, language status, spatial skills, phonemic fluency, college major, or math course-taking history. However, gesturing, particularly making dynamic gestures, was associated with improved insight and proof, and our gesture inhibition rig dramatically reduced participants’ tendency to gesture. How can these results be reconciled?

One explanation for these findings is that gesture is merely a byproduct of, rather than a causative factor in, valid geometric proof construction. In other words, college students in the study who tended to do better at these geometry tasks also tended to gesture more and to gesture in certain ways, but their gestures may not actually be influencing or causing their valid reasoning. If this were the case, we would expect no ill effect when participants were inhibited from gesturing, since their gestures may have been simply an outgrowth of valid reasoning that was already established and that would have taken place with or without gesture.

This interpretation runs counter to other accounts in the literature, including the GSA framework (Hostetter & Alibali, 2008), which posits that gestures can relieve cognitive load and/or that inhibiting gestures may increase cognitive load. A large number of other studies suggest that gesture inhibition impacts a variety of outcomes, from language to recall to problem-solving performance (Alibali & Kita, 2006; Beattie & Coughlan, 1999; Frick-Horbury & Guttentag, 1998; Goldin-Meadow et al., 2001; Hostetter et al., 2007; Nathan & Martinez, 2015; Rauscher et al., 1996; Wagner et al., 2004). However, these studies had varied populations, inhibition methods, sample sizes, and content areas, and these methodological details may be important in explaining the differing results. Differences between children’s and young adults’ cognitive processing may be particularly important. In addition, this study directly addresses a number of correlational studies that link more gesturing, or gesturing in certain ways, to better outcomes for mathematical activities (Gerofsky, 2010; Pier et al., 2019; Walkington et al., 2014; Nathan & Walkington, 2017). This association, while reliably detected across studies and relatively large, may not be causal, and thus may not be a useful malleable factor on which to intervene.

Theories of cognition-action transduction posit that gestures can give learners new ideas 
(Nathan, 2017), and this hypothesis is supported by studies suggesting a positive effect for 
directing gestures on mathematical outcomes (Goldin-Meadow et al., 2009; Novack et al., 2014). 
While directing participants to gesture in specific ways that are known to be effective may have a 
transductive impact on cognitive states, this study suggests that for young adults in mathematics, 
it may not be the case that personally-generated gestures have this impact. Rather than our 
learners’ gestures giving them new actionable ideas about geometric proofs, the present study 
suggests that these gestures may simply illustrate reasoning that would be happening one way or 
another. In order for transduction to reliably take place in a study like the present one, gestures 
may need to be explicitly directed by an outside agent or through structuring of the environment. 
Studies concluding that gesturing improves learning should specify how those gestures were directed or structured, in order to build a clear evidence base.

In addition, our results support the hypothesis that dynamic gestures tend to be significantly associated with insight and proof whereas the general category of “any gesture” is not, indicating dynamic gestures’ important association with valid geometric reasoning. Dynamic gestures show transformations of mathematical objects through multiple states, and thus may signal learners’ understanding of geometric relations. This is consistent with predictions about dynamic gestures given in Nathan and Walkington (2017). However, this again may not be a causative relationship. We also found that gestures and dynamic gestures did not show a significant association with recall or intuition. This is consistent with Nathan and Martinez (2015), who found that spatial tapping selectively impaired making inferences from reading science text but showed no difference on textbase recall or performance.

Limitations and Future Directions 

Although it is difficult to extend these implications to age groups beyond the one examined in this study (college students at a selective university), among a population similar to ours this type of gesture inhibition may not be detrimental to students’ reasoning on geometric proof tasks. Note that, even for college students, the tasks were challenging, as demonstrated by the low success rates on our most stringent measure of mathematical reasoning (proof).

In addition, although the studies reviewed here did involve coding of gestures, most looked only at hand gestures. Here we took a broader view of what constitutes a gesture, as participants seemed able to point to and represent the movement of mathematical objects with their heads rather than their hands. Whether head gestures would serve the same purpose in other content domains is less clear; perhaps they are more useful in geometric reasoning. While studies that use spatial tapping have some clear methodological limitations, one of the major reasons this method of gesture inhibition is used is that it may be more likely to inhibit all gestures, not just hand gestures. However, even with our liberal definition of gestures, gesture rates fell dramatically under our inhibition method, with no accompanying significant change in any performance or language measure. It is also worth mentioning that participants could have been making “micro-movements” inside the gloves or with other body parts that were too subtle to detect visually. The issue of micro-movements is a limitation of all gesture inhibition studies that use a physical form of hand restriction. Despite this issue, most prior gesture inhibition studies using hand restriction have found significant differences.

A number of interventions for mathematics learning have been developed that direct learners to gesture in particular ways (e.g., Agostinho et al., 2015; Ginns et al., 2016; Goldin-Meadow et al., 2009; Hu et al., 2015; Nathan & Walkington, 2017; Ottmar & Landy, 2017; Petrick & Martin, 2012; Smith, King, & Hoyte, 2014). These interventions may be effective because they give learners well-thought-out gestural schemas that are more effective than the gestures learners would use if left to their own devices. In other words, simply having learners gesture more may not be a particularly beneficial path; for gesture to give learners new ideas about geometry, learners may need to be taught to adopt specific types of gestures that are specially designed to demonstrate and embody powerful ideas and relationships. This may hold regardless of the learner’s level of mathematical expertise.

An important question is how these “effective” gestures can be selected, and then how they can be passed on to learners in personally meaningful ways. In addition, to what degree do learners need to actually be directed to produce specific gestures? Research by Abrahamson and Trninic (2015) on the development of proportional reasoning using the body does not direct learners to perform particular gestures, but rather creates an environment with a “field of promoted action” that loosely encourages particular kinds of body movement through feedback and interaction. These gestures are developed spontaneously and are personal, while at the same time being structured by the environment. An alternative approach is illustrated in a video game for promoting geometry reasoning developed by Nathan and Walkington (2017), which elicits directed actions by having players mimic the actions performed by in-game avatars. Other interventions that give the learner highly specific instructions about tracing relational parts of geometric figures have also been found to be effective (Ginns et al., 2015; Hu et al., 2015). If gesture is to be leveraged to play a causative role in mathematical reasoning, these varied forms of intervention are important considerations for future work.
6 Conclusions 

Prior research has suggested somewhat uniform, detrimental effects for gesture inhibition; however, we found that personally-generated gestures may not play a causative role in geometric reasoning, supporting the gestures-as-byproduct hypothesis. These results call into question a long line of studies suggesting detrimental effects for gesture inhibition. They also problematize theories suggesting that gesture can play a causative role in supporting learners’ reasoning. However, this is one of the few studies examining the effects of inhibition on mathematical reasoning, and it focused on high-school-level geometry. The causative role of gesture in promoting changes in language, recall, and reasoning might therefore be different in other domains and at other developmental levels. While mathematical ideas are inherently embodied and perceptual (Lakoff & Núñez, 2000), it may be challenging for learners to spontaneously and meaningfully connect the embodied roots of mathematical ideas to the abstractions, definitions, and theorems they encounter in the typical classroom. What we tested here were highly academic tasks, firmly situated in the system of “school mathematics.” In the mathematics classroom, learners are accustomed to expressing their mathematical reasoning only via written notation, rather than oral language accompanied by action. The ways in which learners use gestures and their bodies are certainly influenced by this overarching system of norms and beliefs in mathematics, and learners might need instruction in particular gestural schemas to bridge this divide and realize the power of gestures.